Overview

Dataset statistics

Number of variables: 18
Number of observations: 824
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 116.0 KiB
Average record size in memory: 144.2 B
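The missing-cell and duplicate-row figures above are plain counts over the table. The following is a minimal sketch of how such counts can be computed. For illustration only, it assumes the data is a list of equal-length row tuples with None marking a missing cell. The function name dataset_statistics is hypothetical. A row counts as a duplicate when it is identical to an earlier row, the same convention pandas' DataFrame.duplicated uses with its default keep="first".

```python
from collections import Counter


def dataset_statistics(rows):
    """Count variables, observations, missing cells and duplicate rows.

    Assumption for this sketch: `rows` is a list of equal-length tuples and a
    cell is "missing" when it is None.
    """
    n_rows = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_rows * n_vars

    missing = sum(1 for row in rows for cell in row if cell is None)

    # Every extra copy of an identical row counts as one duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())

    return {
        "Number of variables": n_vars,
        "Number of observations": n_rows,
        "Missing cells": missing,
        "Missing cells (%)": 100.0 * missing / n_cells if n_cells else 0.0,
        "Duplicate rows": duplicates,
        "Duplicate rows (%)": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }


rows = [(37, 52, "technician"), (75, 41, None), (37, 52, "technician")]
stats = dataset_statistics(rows)
print(stats["Missing cells"], stats["Duplicate rows"])  # 1 1
```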

Variable types

Categorical (CAT): 10
Numeric (NUM): 7
Boolean (BOOL): 1

Reproduction

Analysis started: 2020-06-24 05:59:22.655732
Analysis finished: 2020-06-24 05:59:30.717229
Duration: 8.06 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

cluster has constant value "1" (Constant)
df_index has unique values (Unique)
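Both warnings follow from the distinct counts reported for each variable: cluster has a single distinct value across 824 rows, and df_index has 824 distinct values in 824 rows. The following is a hypothetical sketch of that check, not pandas-profiling's own implementation. It assumes each column is given as a plain list of values, and the function name constant_and_unique_warnings is illustrative.

```python
def constant_and_unique_warnings(columns):
    """Return (column, message, warning_type) triples for two simple checks.

    A column is flagged "Constant" when it has exactly one distinct value and
    "Unique" when every value is distinct. `columns` maps a column name to a
    list of its values (an assumption made for this sketch).
    """
    found = []
    for name, values in columns.items():
        if not values:
            continue  # nothing to check in an empty column
        distinct = set(values)
        if len(distinct) == 1:
            (value,) = distinct
            found.append((name, f'has constant value "{value}"', "Constant"))
        elif len(distinct) == len(values):
            found.append((name, "has unique values", "Unique"))
    return found


columns = {
    "cluster": [1, 1, 1, 1],
    "df_index": [37, 75, 88, 164],
    "age": [52, 41, 49, 52],
}
for name, message, kind in constant_and_unique_warnings(columns):
    print(f"{name} {message} ({kind})")
# cluster has constant value "1" (Constant)
# df_index has unique values (Unique)
```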

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct count: 824
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20221.21359223301
Minimum: 37
Maximum: 41164
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 37
5th percentile: 2130.3
Q1: 11112.5
Median: 19165.5
Q3: 29411.25
95th percentile: 38966
Maximum: 41164
Range: 41127
Interquartile range (IQR): 18298.75

Descriptive statistics

Standard deviation: 11449.16203
Coefficient of variation (CV): 0.5661955932
Kurtosis: -1.071785952
Mean: 20221.21359
Median Absolute Deviation (MAD): 9083.5
Skewness: 0.08129332794
Sum: 16662280
Variance: 131083311.1

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
16166 | 1 | 0.1%
12979 | 1 | 0.1%
10061 | 1 | 0.1%
2763 | 1 | 0.1%
33481 | 1 | 0.1%
22355 | 1 | 0.1%
40543 | 1 | 0.1%
2755 | 1 | 0.1%
1766 | 1 | 0.1%
23228 | 1 | 0.1%
Other values (814) | 814 | 98.8%

Minimum 5 values

Value | Count | Frequency (%)
37 | 1 | 0.1%
75 | 1 | 0.1%
88 | 1 | 0.1%
164 | 1 | 0.1%
199 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
41164 | 1 | 0.1%
41123 | 1 | 0.1%
41121 | 1 | 0.1%
40970 | 1 | 0.1%
40880 | 1 | 0.1%
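The quantile and descriptive statistics listed for df_index, and for every numeric variable below, can be reproduced from a list of values with Python's statistics module. This is a sketch under stated assumptions:

- Quantiles interpolate linearly between order statistics, using statistics.quantiles with method="inclusive", which agrees with NumPy's default percentile method.
- Variance and standard deviation are sample (n - 1) estimates.
- MAD is the median of the absolute deviations from the median.

The function name numeric_summary is hypothetical.

```python
import statistics


def numeric_summary(values):
    """Quantile and descriptive statistics for a list of at least two numbers."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics,
    # which gives the same results as NumPy's default percentile method.
    quartiles = statistics.quantiles(data, n=4, method="inclusive")
    ventiles = statistics.quantiles(data, n=20, method="inclusive")
    mean = statistics.fmean(data)
    median = statistics.median(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1)
    return {
        "Minimum": data[0],
        "5th percentile": ventiles[0],
        "Q1": quartiles[0],
        "Median": median,
        "Q3": quartiles[2],
        "95th percentile": ventiles[-1],
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "Interquartile range (IQR)": quartiles[2] - quartiles[0],
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Mean": mean,
        # Median absolute deviation: median distance of each value from the median.
        "Median Absolute Deviation (MAD)": statistics.median(abs(x - median) for x in data),
        "Sum": sum(data),
        "Variance": statistics.variance(data),  # sample variance (n - 1)
    }


summary = numeric_summary([1, 2, 3, 4, 100])
print(summary["Interquartile range (IQR)"], summary["Median Absolute Deviation (MAD)"])  # 2.0 1
```

The outlier 100 inflates the standard deviation but leaves the IQR and MAD untouched, which is why the report shows both kinds of spread measure.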

age
Real number (ℝ≥0)

Distinct count: 54
Unique (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.60800970873787
Minimum: 19
Maximum: 92
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 19
5th percentile: 26
Q1: 32
Median: 38
Q3: 46
95th percentile: 57
Maximum: 92
Range: 73
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 10.29022275
Coefficient of variation (CV): 0.259801561
Kurtosis: 0.7273298942
Mean: 39.60800971
Median Absolute Deviation (MAD): 7
Skewness: 0.7771791336
Sum: 32637
Variance: 105.8886842

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
32 | 47 | 5.7%
34 | 43 | 5.2%
31 | 39 | 4.7%
36 | 37 | 4.5%
33 | 34 | 4.1%
30 | 34 | 4.1%
43 | 33 | 4.0%
37 | 30 | 3.6%
29 | 30 | 3.6%
35 | 29 | 3.5%
Other values (44) | 468 | 56.8%

Minimum 5 values

Value | Count | Frequency (%)
19 | 2 | 0.2%
20 | 2 | 0.2%
21 | 1 | 0.1%
22 | 4 | 0.5%
23 | 7 | 0.8%

Maximum 5 values

Value | Count | Frequency (%)
92 | 1 | 0.1%
78 | 1 | 0.1%
76 | 1 | 0.1%
74 | 2 | 0.2%
71 | 2 | 0.2%
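The histogram for each numeric variable uses 10 bins of equal width between the minimum and the maximum. For age, that is a width of (92 - 19) / 10 = 7.3 years. The following is a minimal sketch of fixed-width binning; the function name fixed_width_histogram is hypothetical. Like numpy.histogram, every bin is half-open except the last, which also includes the maximum. Unlike NumPy, this sketch does not correct for floating-point rounding, so a value lying exactly on an interior bin edge can occasionally land one bin lower.

```python
def fixed_width_histogram(values, bins=10):
    """Count values in `bins` equal-width bins spanning [min, max].

    Bins are half-open [left, right) except the last, which is closed so that
    the maximum is counted. Simplification for this sketch: if all values are
    equal, they are all placed in the first bin.
    """
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for x in values:
        index = int((x - lo) / width)
        # The maximum sits on the right edge of the last bin; fold it in.
        counts[min(index, bins - 1)] += 1
    return counts


print(fixed_width_histogram(list(range(11)), bins=5))  # [2, 2, 2, 2, 3]
```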

job
Categorical

Distinct count: 12
Unique (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
admin. | 210 | 25.5%
blue-collar | 207 | 25.1%
technician | 126 | 15.3%
services | 67 | 8.1%
management | 51 | 6.2%
self-employed | 36 | 4.4%
retired | 35 | 4.2%
entrepreneur | 32 | 3.9%
unemployed | 18 | 2.2%
housemaid | 18 | 2.2%
Other values (2) | 24 | 2.9%

Length

Max length: 13
Median length: 10
Mean length: 9.041262136
Min length: 6

marital
Categorical

Distinct count: 4
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
married | 492 | 59.7%
single | 249 | 30.2%
divorced | 81 | 9.8%
unknown | 2 | 0.2%

Length

Max length: 8
Median length: 7
Mean length: 6.796116505
Min length: 6
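Each value table lists the most frequent values with their share of the 824 rows and folds the remaining distinct values into an "Other values (k)" row. The following is a minimal sketch of that layout; the function name frequency_table is hypothetical. Percentages are rounded to one decimal place. Rebuilding the marital column from the counts above and keeping only the top two values shows how the fold works.

```python
from collections import Counter


def frequency_table(values, top=10):
    """Return (value, count, percent) rows for the `top` most common values.

    Remaining distinct values are folded into one "Other values (k)" row.
    Percentages are rounded to one decimal place.
    """
    n = len(values)
    counts = Counter(values)
    rows = [(value, count, round(100 * count / n, 1))
            for value, count in counts.most_common(top)]
    rest = len(counts) - len(rows)
    if rest > 0:
        other = n - sum(count for _, count, _ in rows)
        rows.append((f"Other values ({rest})", other, round(100 * other / n, 1)))
    return rows


marital = ["married"] * 492 + ["single"] * 249 + ["divorced"] * 81 + ["unknown"] * 2
for row in frequency_table(marital, top=2):
    print(row)
# ('married', 492, 59.7)
# ('single', 249, 30.2)
# ('Other values (2)', 83, 10.1)
```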

education
Categorical

Distinct count: 8
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
university.degree | 238 | 28.9%
high.school | 189 | 22.9%
basic.9y | 127 | 15.4%
professional.course | 93 | 11.3%
basic.4y | 83 | 10.1%
basic.6y | 58 | 7.0%
unknown | 35 | 4.2%
illiterate | 1 | 0.1%

Length

Max length: 19
Median length: 11
Mean length: 12.48907767
Min length: 7

default
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
no | 664 | 80.6%
unknown | 160 | 19.4%

Length

Max length: 7
Median length: 2
Mean length: 2.970873786
Min length: 2
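The Length statistics summarize the string length of each value. Rebuilding the default column from its value counts above (664 × "no", 160 × "unknown") reproduces the reported figures: the mean is (664 × 2 + 160 × 7) / 824 = 2448 / 824. The helper length_statistics is hypothetical.

```python
import statistics


def length_statistics(values):
    """Max, median, mean and min string length of a categorical column."""
    lengths = [len(str(value)) for value in values]
    return {
        "Max length": max(lengths),
        "Median length": statistics.median(lengths),
        "Mean length": statistics.fmean(lengths),
        "Min length": min(lengths),
    }


# Rebuild the default column from the value counts reported above.
default = ["no"] * 664 + ["unknown"] * 160
stats = length_statistics(default)
print(stats["Mean length"])  # approximately 2.970873786, as in the report
```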

housing
Categorical

Distinct count: 3
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
yes | 417 | 50.6%
no | 386 | 46.8%
unknown | 21 | 2.5%

Length

Max length: 7
Median length: 3
Mean length: 2.633495146
Min length: 2

loan
Categorical

Distinct count: 3
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
no | 659 | 80.0%
yes | 144 | 17.5%
unknown | 21 | 2.5%

Length

Max length: 7
Median length: 2
Mean length: 2.302184466
Min length: 2

comm_type
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
cellular | 547 | 66.4%
telephone | 277 | 33.6%

Length

Max length: 9
Median length: 8
Mean length: 8.336165049
Min length: 8

month
Categorical

Distinct count: 10
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
may | 245 | 29.7%
jul | 186 | 22.6%
aug | 110 | 13.3%
jun | 101 | 12.3%
nov | 80 | 9.7%
apr | 62 | 7.5%
oct | 17 | 2.1%
sep | 13 | 1.6%
dec | 6 | 0.7%
mar | 4 | 0.5%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

day_of_week
Categorical

Distinct count: 5
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
thu | 186 | 22.6%
fri | 171 | 20.8%
wed | 167 | 20.3%
tue | 157 | 19.1%
mon | 143 | 17.4%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

last_contact_duration
Real number (ℝ≥0)

Distinct count: 500
Unique (%): 60.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1406.2087378640776
Minimum: 1053
Maximum: 4918
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 1053
5th percentile: 1068.3
Q1: 1142.75
Median: 1271.5
Q3: 1500.5
95th percentile: 2177.25
Maximum: 4918
Range: 3865
Interquartile range (IQR): 357.75

Descriptive statistics

Standard deviation: 432.513432
Coefficient of variation (CV): 0.3075741321
Kurtosis: 13.30726681
Mean: 1406.208738
Median Absolute Deviation (MAD): 153.5
Skewness: 3.065087859
Sum: 1158716
Variance: 187067.8689

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1106 | 6 | 0.7%
1156 | 5 | 0.6%
1120 | 5 | 0.6%
1130 | 5 | 0.6%
1063 | 5 | 0.6%
1206 | 5 | 0.6%
1080 | 5 | 0.6%
1161 | 5 | 0.6%
1081 | 4 | 0.5%
1210 | 4 | 0.5%
Other values (490) | 775 | 94.1%

Minimum 5 values

Value | Count | Frequency (%)
1053 | 1 | 0.1%
1054 | 1 | 0.1%
1055 | 2 | 0.2%
1056 | 3 | 0.4%
1057 | 2 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
4918 | 1 | 0.1%
4199 | 1 | 0.1%
3785 | 1 | 0.1%
3643 | 1 | 0.1%
3631 | 1 | 0.1%
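The large positive skewness (about 3.07) and excess kurtosis (about 13.3) of last_contact_duration reflect its long right tail: the median is 1271.5 but the maximum is 4918. The sketch below computes the adjusted Fisher-Pearson skewness (G1) and the bias-adjusted excess kurtosis (G2). These are the estimators pandas' Series.skew() and Series.kurt() compute. That this report uses exactly these estimators is an assumption. The function name skewness_and_kurtosis is hypothetical.

```python
import math


def skewness_and_kurtosis(values):
    """Adjusted sample skewness (G1) and excess kurtosis (G2).

    Needs at least 4 values that are not all equal.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments with 1/n normalization.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    g1 = m3 / m2 ** 1.5
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1

    g2 = m4 / m2 ** 2 - 3.0  # excess kurtosis: 0 for a normal distribution
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return skew, kurt


print(skewness_and_kurtosis([1, 2, 3, 4, 5]))  # skewness 0, excess kurtosis about -1.2
```

A symmetric sample has zero skewness, and a flat, short-tailed sample has negative excess kurtosis. A single large value in a toy sample such as [1, 2, 3, 4, 50] pushes both measures up, as the long tail does here.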

campaign_contact_count
Real number (ℝ≥0)

Distinct count: 18
Unique (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.591019417475728
Minimum: 1
Maximum: 26
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 3
95th percentile: 6
Maximum: 26
Range: 25
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.296696752
Coefficient of variation (CV): 0.8864066153
Kurtosis: 22.88968255
Mean: 2.591019417
Median Absolute Deviation (MAD): 1
Skewness: 3.782941169
Sum: 2135
Variance: 5.27481597

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 280 | 34.0%
2 | 251 | 30.5%
3 | 139 | 16.9%
4 | 67 | 8.1%
5 | 27 | 3.3%
6 | 19 | 2.3%
7 | 12 | 1.5%
10 | 6 | 0.7%
9 | 6 | 0.7%
8 | 4 | 0.5%
Other values (8) | 13 | 1.6%

Minimum 5 values

Value | Count | Frequency (%)
1 | 280 | 34.0%
2 | 251 | 30.5%
3 | 139 | 16.9%
4 | 67 | 8.1%
5 | 27 | 3.3%

Maximum 5 values

Value | Count | Frequency (%)
26 | 1 | 0.1%
19 | 1 | 0.1%
17 | 2 | 0.2%
15 | 1 | 0.1%
14 | 1 | 0.1%
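The "Minimum 5 values" table lists the five smallest distinct values with their counts. The "Maximum 5 values" table lists the five largest distinct values in descending order. For campaign_contact_count, the minimum table coincides with the five most common values, because most clients were contacted only a few times. The following is a minimal sketch on toy data; the function name extreme_values is hypothetical.

```python
from collections import Counter


def extreme_values(values, k=5):
    """Return the k smallest and k largest distinct values as
    (value, count, percent) rows, percent rounded to one decimal place."""
    n = len(values)
    counts = Counter(values)
    ordered = sorted(counts)  # distinct values in ascending order

    def as_rows(selection):
        return [(v, counts[v], round(100 * counts[v] / n, 1)) for v in selection]

    smallest = as_rows(ordered[:k])
    largest = as_rows(reversed(ordered[-k:]))  # largest first
    return smallest, largest


smallest, largest = extreme_values([3, 1, 2, 2, 9, 7, 7, 7, 5, 4], k=2)
print(smallest)  # [(1, 1, 10.0), (2, 2, 20.0)]
print(largest)   # [(9, 1, 10.0), (7, 3, 30.0)]
```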

poutcome
Categorical

Distinct count: 3
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
nonexistent | 736 | 89.3%
failure | 62 | 7.5%
success | 26 | 3.2%

Length

Max length: 11
Median length: 11
Mean length: 10.57281553
Min length: 7

cons.price.idx
Real number (ℝ≥0)

Distinct count: 25
Unique (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 93.60325
Minimum: 92.201
Maximum: 94.767
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 92.201
5th percentile: 92.843
Q1: 93.075
Median: 93.918
Q3: 93.994
95th percentile: 94.465
Maximum: 94.767
Range: 2.566
Interquartile range (IQR): 0.919

Descriptive statistics

Standard deviation: 0.5638976068
Coefficient of variation (CV): 0.006024337903
Kurtosis: -0.7758244425
Mean: 93.60325
Median Absolute Deviation (MAD): 0.474
Skewness: -0.305689062
Sum: 77129.078
Variance: 0.3179805109

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
93.918 | 181 | 22.0%
93.994 | 137 | 16.6%
92.893 | 105 | 12.7%
93.444 | 92 | 11.2%
94.465 | 87 | 10.6%
93.2 | 71 | 8.6%
93.075 | 57 | 6.9%
92.201 | 11 | 1.3%
92.431 | 11 | 1.3%
92.963 | 10 | 1.2%
Other values (15) | 62 | 7.5%

Minimum 5 values

Value | Count | Frequency (%)
92.201 | 11 | 1.3%
92.379 | 5 | 0.6%
92.431 | 11 | 1.3%
92.469 | 2 | 0.2%
92.649 | 6 | 0.7%

Maximum 5 values

Value | Count | Frequency (%)
94.767 | 3 | 0.4%
94.601 | 2 | 0.2%
94.465 | 87 | 10.6%
94.215 | 3 | 0.4%
94.199 | 8 | 1.0%

cons.conf.idx
Real number (ℝ)

Distinct count: 25
Unique (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -40.74526699029127
Minimum: -50.8
Maximum: -26.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: -50.8
5th percentile: -47.1
Q1: -42.7
Median: -42
Q3: -36.4
95th percentile: -34.6
Maximum: -26.9
Range: 23.9
Interquartile range (IQR): 6.3

Descriptive statistics

Standard deviation: 4.515199327
Coefficient of variation (CV): -0.1108153084
Kurtosis: -0.01263063113
Mean: -40.74526699
Median Absolute Deviation (MAD): 4.2
Skewness: 0.4983707218
Sum: -33574.1
Variance: 20.38702496

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
-42.7 | 181 | 22.0%
-36.4 | 137 | 16.6%
-46.2 | 105 | 12.7%
-36.1 | 92 | 11.2%
-41.8 | 87 | 10.6%
-42 | 71 | 8.6%
-47.1 | 57 | 6.9%
-26.9 | 11 | 1.3%
-31.4 | 11 | 1.3%
-40.8 | 10 | 1.2%
Other values (15) | 62 | 7.5%

Minimum 5 values

Value | Count | Frequency (%)
-50.8 | 3 | 0.4%
-50 | 3 | 0.4%
-49.5 | 2 | 0.2%
-47.1 | 57 | 6.9%
-46.2 | 105 | 12.7%

Maximum 5 values

Value | Count | Frequency (%)
-26.9 | 11 | 1.3%
-29.8 | 5 | 0.6%
-30.1 | 6 | 0.7%
-31.4 | 11 | 1.3%
-33 | 6 | 0.7%

nr.employed
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5171.220388349514
Minimum: 4963.6
Maximum: 5228.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 4963.6
5th percentile: 5017.5
Q1: 5099.1
Median: 5195.8
Q3: 5228.1
95th percentile: 5228.1
Maximum: 5228.1
Range: 264.5
Interquartile range (IQR): 129

Descriptive statistics

Standard deviation: 71.69256184
Coefficient of variation (CV): 0.01386376067
Kurtosis: 0.2462926255
Mean: 5171.220388
Median Absolute Deviation (MAD): 32.3
Skewness: -1.152195025
Sum: 4261085.6
Variance: 5139.823423

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
5228.1 | 360 | 43.7%
5099.1 | 165 | 20.0%
5191 | 137 | 16.6%
5195.8 | 75 | 9.1%
5076.2 | 23 | 2.8%
5017.5 | 22 | 2.7%
4991.6 | 14 | 1.7%
4963.6 | 13 | 1.6%
5008.7 | 9 | 1.1%
5023.5 | 6 | 0.7%

Minimum 5 values

Value | Count | Frequency (%)
4963.6 | 13 | 1.6%
4991.6 | 14 | 1.7%
5008.7 | 9 | 1.1%
5017.5 | 22 | 2.7%
5023.5 | 6 | 0.7%

Maximum 5 values

Value | Count | Frequency (%)
5228.1 | 360 | 43.7%
5195.8 | 75 | 9.1%
5191 | 137 | 16.6%
5099.1 | 165 | 20.0%
5076.2 | 23 | 2.8%
 

cluster
Boolean

CONSTANT
REJECTED

Distinct count: 1
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Value | Count | Frequency (%)
1 | 824 | 100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (with positive scale factors). Consequently, for an exact linear relationship Y = aX + b the slope does not affect r: r = +1 for any a > 0 and r = -1 for any a < 0.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
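The definition above translates directly into code. The following is a minimal sketch; the helper pearson_r is hypothetical. It uses sample (n - 1) covariance and standard deviations, although the normalization cancels in the ratio. The example also illustrates the slope invariance: any positive slope gives r = +1 and any negative slope gives r = -1, up to floating-point rounding.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and standard deviations (n - 1 denominators).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [3 * x + 7 for x in xs]))  # approximately 1.0
print(pearson_r(xs, [-0.5 * x for x in xs]))   # approximately -1.0
```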

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
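The following is a minimal sketch of Spearman's ρ as Pearson's r on the rank variables. It assumes tied values receive the average of the ranks they span, which is the usual convention. The helper names average_ranks, pearson_r and spearman_rho are hypothetical. For y = x³, the relationship is monotonic but not linear, so ρ is 1 while Pearson's r is about 0.94.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)  # the 1/(n - 1) factors cancel


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))  # rho about 1.0, r about 0.943
```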

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation.

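To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair is concordant if X and Y order its two observations the same way, and discordant if they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations.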
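The pair-counting definition above can be implemented directly in O(n²) time. The following sketch implements the tie-free form, often called τ-a; the function name kendall_tau_a is hypothetical. Pairs tied in X or Y count as neither concordant nor discordant. Libraries handle ties differently: SciPy's kendalltau, for example, defaults to the tie-adjusted τ-b.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: X and Y order the pair the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# 5 concordant pairs and 1 discordant pair out of 6: (5 - 1) / 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # about 0.667
```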

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimator of Cramér's V is known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
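The following is a minimal sketch of the bias-corrected estimator as the correction is usually stated. Let r and k be the numbers of categories of the two variables and n the number of observations. The correction proceeds in three steps:

- φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1) and floored at 0.
- r and k are replaced by r - (r - 1)²/(n - 1) and k - (k - 1)²/(n - 1).
- V is the square root of the corrected φ² divided by the smaller of the corrected (k - 1) and (r - 1).

Several choices here are assumptions of this sketch, not a description of pandas-profiling's code:

- χ² is computed without Yates' continuity correction.
- NaN is returned when either variable has a single category.
- The function name cramers_v_corrected is hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma, 2013) for two nominal variables."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    r, k = len(x_counts), len(y_counts)

    # Pearson's chi-squared over every cell of the r x k contingency table,
    # including cells that were never observed (joint[...] is then 0).
    chi2 = 0.0
    for a, count_a in x_counts.items():
        for b, count_b in y_counts.items():
            expected = count_a * count_b / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return float("nan")  # a variable with a single category
    return math.sqrt(phi2_corr / denom)


same = ["a", "b"] * 50
print(cramers_v_corrected(same, same))  # about 1.0: perfect association
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25))  # 0.0: independent
```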

Missing values

Sample

First rows

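row | df_index | age | job | marital | education | default | housing | loan | comm_type | month | day_of_week | last_contact_duration | campaign_contact_count | poutcome | cons.price.idx | cons.conf.idx | nr.employed | cluster
0 | 37 | 52 | technician | married | basic.9y | no | yes | no | telephone | may | mon | 1666 | 1 | nonexistent | 93.994 | -36.4 | 5191.0 | 1
1 | 75 | 41 | blue-collar | divorced | basic.4y | unknown | yes | no | telephone | may | mon | 1575 | 1 | nonexistent | 93.994 | -36.4 | 5191.0 | 1
2 | 88 | 49 | technician | married | basic.9y | no | no | no | telephone | may | mon | 1467 | 1 | nonexistent | 93.994 | -36.4 | 5191.0 | 1
3 | 164 | 39 | services | divorced | high.school | unknown | no | no | telephone | may | mon | 2033 | 1 | nonexistent | 93.994 | -36.4 | 5191.0 | 1
4 | 199 | 43 | blue-collar | married | basic.6y | no | yes | no | telephone | may | mon | 1077 | 1 | nonexistent | 93.994 | -36.4 | 5191.0 | 1
5 | 388 | 28 | unknown | single | unknown | unknown | yes | yes | telephone | may | tue | 1201 | 1 | nonexistent | 93.994 | -36.4 | 5191.0 | 1
6 | 446 | 42 | technician | married | professional.course | no | no | no | telephone | may | tue | 1623 | 1 | nonexistent | 93.994 | -36.4 | 5191.0 | 1
7 | 469 | 42 | management | married | university.degree | no | no | no | telephone | may | tue | 1677 | 1 | nonexistent | 93.994 | -36.4 | 5191.0 | 1
8 | 556 | 42 | blue-collar | married | high.school | no | no | yes | telephone | may | tue | 1297 | 3 | nonexistent | 93.994 | -36.4 | 5191.0 | 1
9 | 590 | 32 | technician | married | professional.course | no | no | no | telephone | may | tue | 1906 | 3 | nonexistent | 93.994 | -36.4 | 5191.0 | 1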

Last rows

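row | df_index | age | job | marital | education | default | housing | loan | comm_type | month | day_of_week | last_contact_duration | campaign_contact_count | poutcome | cons.price.idx | cons.conf.idx | nr.employed | cluster
814 | 40672 | 45 | unemployed | married | professional.course | unknown | no | no | telephone | sep | thu | 1405 | 1 | failure | 94.199 | -37.5 | 4963.6 | 1
815 | 40730 | 60 | retired | married | high.school | no | no | no | cellular | sep | wed | 1640 | 1 | nonexistent | 94.199 | -37.5 | 4963.6 | 1
816 | 40764 | 36 | technician | married | university.degree | no | unknown | unknown | cellular | sep | thu | 1334 | 2 | nonexistent | 94.199 | -37.5 | 4963.6 | 1
817 | 40836 | 30 | student | single | professional.course | no | yes | no | cellular | sep | mon | 1616 | 4 | success | 94.199 | -37.5 | 4963.6 | 1
818 | 40838 | 32 | admin. | married | high.school | no | yes | no | cellular | sep | mon | 1298 | 1 | nonexistent | 94.199 | -37.5 | 4963.6 | 1
819 | 40880 | 28 | admin. | single | high.school | no | no | no | cellular | oct | wed | 1246 | 2 | nonexistent | 94.601 | -49.5 | 4963.6 | 1
820 | 40970 | 24 | admin. | single | university.degree | no | yes | no | cellular | oct | fri | 1176 | 3 | success | 94.601 | -49.5 | 4963.6 | 1
821 | 41121 | 46 | admin. | single | university.degree | no | yes | no | cellular | nov | tue | 1166 | 3 | failure | 94.767 | -50.8 | 4963.6 | 1
822 | 41123 | 36 | blue-collar | single | basic.6y | no | no | no | cellular | nov | tue | 1556 | 4 | nonexistent | 94.767 | -50.8 | 4963.6 | 1
823 | 41164 | 54 | admin. | married | professional.course | no | no | no | cellular | nov | tue | 1868 | 2 | success | 94.767 | -50.8 | 4963.6 | 1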